Abstract


Our goal is to provide a top-down approach to biomolecular com- 
putation. In spite of widespread discussion about connections between 
biology and computation, one question seems notable by its absence: 
Where are the programs? We identify a number of common features 
in programming that seem conspicuously absent from the literature on 
biomolecular computing; to partially redress this absence, we introduce 
a model of computation that is evidently programmable, by programs 
reminiscent of low-level computer machine code; and at the same time 
biologically plausible: its functioning is defined by a single and relatively 
small set of chemical-like reaction rules. Further properties: the model 
is stored-program: programs are the same as data, so programs are 
not only executable, but are also compilable and interpretable. It is 
universal: all computable functions can be computed (in natural ways 
and without arcane encodings of data and algorithm); it is also uniform: 
new “hardware” is not needed to solve new problems; and (last but 
not least) it is Turing complete in a strong sense: a universal algorithm 
exists, that is able to execute any program, and is not asymptotically 
inefficient. 


1 Biochemical Universality and Programming 


It has been known for some time that various forms of biomolecular
computation are Turing complete [8, 9, 11, 13, 39, 44, 47, 48]. The net effect is to
show that any computable function can be computed, in some appropriate
sense, by an instance of the biological mechanism being studied. However, 
the arguments for Turing universality we have seen are less than compelling 
from a programming perspective. 

This paper (a journal version of [25]) opens new perspectives on speci- 
fying computation at the biological level. Its purpose is to provide a better 
computation model where the concept of “program” is clearly visible and 
natural, and in which Turing completeness is not artificial, but rather a 
natural part of biomolecular computation. We describe a prototype model 
that has been implemented (in silico on a conventional computer); a self- 
interpreter for the computation model; a powerful visualiser; and prove 
that the computational model can naturally simulate an arbitrary Turing 
machine. 

We begin by evaluating some established results on biomolecular com- 
putational completeness from a programming perspective; and then construc- 
tively provide an alternative solution. The new model seems biologically 
plausible, and usable for solving a variety of problems of computational as 
well as biological interest. 


The central question: can program execution take place in a biolog- 
ical context? Evidence for “yes” includes many analogies between biological 
processes and the world of programs: program-like behavior, e.g., genes that 
direct protein fabrication; “switching on” and “switching off”; processes; and 
reproduction. 

A clarification from the start: this paper takes a synthetic viewpoint, 
concerned with building things as in the engineering and computer sciences. 
This is in contrast to the ubiquitous analytic viewpoint common to the 
natural sciences, concerned with finding out how naturally evolved things 
work. 

The authors’ backgrounds lie in the semantics of programming languages, 
compilers, and computability and complexity theory; and admittedly not 
biology. We focus on the synthetic question can, rather than the usual 
natural scientists’ analytical question does. 


Where are the programs? In existing biomolecular computation 
models it is very hard to see anything like a program that realises or directs 
a computational process. For instance, in cellular automata the program is 
expressed only in the initial cell configuration, or in the global transition 
function. In many biocomputation papers the authors, given a problem,
cleverly devise a biomolecular system that can solve this particular problem.
However, the algorithm being implemented is hidden in the details of the 
system’s construction, and hard to see, so the program or algorithm is in no 
sense a “first-class citizen”. Our purpose is to fill this gap, to establish a 
biologically feasible framework in which programs are first-class citizens. 


2 Relation to Other Computational Frameworks 


We put our contributions in context by quickly summarising some other 
computational completeness frameworks. Key dimensions: uniformity; 
programmability; efficiency; simplicity; universality; and biological plausi- 
bility. (Not every model is discussed from every dimension, e.g., a model 
weak on a dimension early in the list need not be considered for biological 
plausibility.) 

Circuits, BDDs, finite automata. While well proven in engineering 
practice, these models don’t satisfy our goal of computational completeness. 
The reason: they are non-uniform and so not Turing complete. Any single 
instance of a circuit or a BDD or a finite automaton has a control space 
and memory that are both finite. Consequently, any general but unbounded 
computational problem (e.g., multiplying two arbitrarily large integers) must 
be done by choosing one among an infinite family of circuits, BDDs or 
automata. 

The Turing machine. Strong points. Highly successful for theoretical 
purposes, the Turing model is uniform; there exists a clear concept of 
“program”; and the “universal Turing machine” from 1936 is the seminal 
example of a self-interpreter. The Turing model has fruitfully been used 
to study computational complexity problem classes as small as PTIME and 
LOGSPACE. 

Weak points. Turing machines do not well model computation times 
small enough to be realistically interesting, e.g., near-linear time. The inbuilt 
“data transport” problems due to the model’s one-dimensional tape (or tapes, 
on a multi-tape variant) mean that naturally efficient algorithms may be 
difficult to program on a Turing machine. For example, a time O(n) algorithm
may suffer asymptotic slowdown when implemented on a Turing machine,
being forced to run in time O(n²) because of architectural limitations. A
universal Turing machine has essentially the same problem: it typically 
runs quadratically slower than the program it is simulating. Still greater 
slowdowns may occur if one uses smaller Turing complete languages, for
instance the counter or Minsky register machines as used in [8, 9, 13, 33].


Other computation models with an explicit concept of pro- 
gram. Numerous alternatives to the Turing machine have been developed, 
e.g., the Tag systems studied by Post and Minsky, and a variety of register 
or counter machines. Closer to computer science are recursive functions; the 
λ-calculus; functional programming languages such as LISP; and machines
with randomly addressable memories including the RAM and, most relevant 
to our work, its stored-program variant the RASP [29]. These models rate 
well on some of the key dimensions listed above. However they are rather 
complex; and were certainly not designed with biological plausibility in mind. 


Cellular automata. John von Neumann’s groundbreaking work 
on cellular automata was done in the 1940s, at around the time he also 
invented today’s digital computer. In [44] computational completeness was 
established by showing that any Turing machine could be simulated by a 
cellular automaton. Further, it was painstakingly and convincingly argued 
that a cellular automaton could achieve self-reproduction. Von Neumann’s 
and subsequent cellular automaton models, e.g., LIFE and Wolfram’s models 
[21, 9, 47], have some shortcomings, though. Though recent advances have
remedied the lack of asynchronous computations [35], a second, serious 
drawback is the lack of programmability: once the global transition function 
has been selected (e.g., there is only one such in LIFE) there is little more 
that the user of the system can do; the only degree of freedom remaining is 
to choose the initial configuration of cell states. There is no explicit concept 
of a program that can be devised by the user. Rather, any algorithmic 
ideas have to be encoded in a highly indirect manner, into either the global 
transition function or into the initial cell state configuration. In a sense, the 
initial state of a universal CA represents both the program to be simulated, 
and its input; but in the zoo of cellular automata proven to be universal, 
there seems to be no standard way to identify which parts of the initial state 
correspond to, say, a certain control structure in a program, or a specific 
substructure of a data structure such as a list. 


Biomolecular computation frameworks. We will see that the 
Turing-typical asymptotic slowdowns can be avoided while using a biomolec- 
ular computing model. This provides an advance over both earlier work on 
automata-based computation models (Turing machines, counter machines, 
etc.), and over some other approaches to biomolecular computing.


A number of contributions exist in this area; a non-exhaustive list:
[1, 3, 8, 11, 9, 12, 13, 24, 31, 32, 39, 40, 45, 46, 5, 48]. The list is rather mixed:
several of the articles describe concrete finite-automaton-like computations,
emphasising their realisation in actual biochemical laboratory contexts. As 
such their emphasis is not on general computations but rather on showing 
feasibility of specific computations in the laboratory. Articles [8, 9, 13, 31, 48] 
directly address Turing completeness, but the algorithmic or programming 
aspects are not easy to see. 

How our approach is different: Contrary to several existing models, 
our atomic notion (the “blob”) carries a fixed amount of data and has a 
fixed number of possible interaction points with other blobs. Further, one 
fixed set of rules specifies how local collections of blobs are changed. In this
sense, our setup resembles specific cellular automata, e.g., Conway’s Game
of Life, where only the initial state may vary. Contrary to cellular automata,
both programs and data are clearly identified ensembles of blobs. Further, 
we use a textual representation of programs closely resembling machine code 
such that each line essentially corresponds to a single blob instruction with 
parameters and bonds. The resulting code conforms closely to traditional 
low-level programming concepts, including use of conditionals and jumps. 

Outline of the paper: Section 3 introduces some notation to describe 
program execution. Section 4 has more discussion of computational
completeness. Section 5 concerns the blob model of computation, with an explicit
program component. Section 6 relates the blob model to more traditional 
computation models, and Section 7 describes prototype implementations of 
the blob model, including a sophisticated visualiser. Section 8 concludes. 
Appendix A shows how a Turing machine may be simulated in the blob
model; this is possible with a constant slowdown because of the flexibility of blobs
when considered as data structures.


3 Notations: Direct or Interpretive Program Ex- 
ecution 


What do we mean by a program (roughly)? An answer: a set of instructions 
that specify a series (or set) of actions on data. Actions are carried out 
when the instructions are executed (activated, ...). Further, a program is
software, not hardware. Thus a program should itself be a concrete data 
object that can be replaced to specify different actions. 


Direct program execution: write $[\![\mathit{program}]\!]$ to denote the meaning
or net effect of running program. A program meaning is often a function
from data input values to output values. Expressed symbolically:

$$[\![\mathit{program}]\!](\mathit{data}_{in}) = \mathit{data}_{out}$$


The program is activated (run, executed) by applying the semantic function 
$[\![\cdot]\!]$. The task of programming is, given a desired semantic meaning, to find a
program that computes it. Some mechanism is needed to execute program,
i.e., to compute $[\![\mathit{program}]\!]$. This can be done either by hardware or by
software. 


Interpretive program execution: Here program is a passive data 
object, but it is now activated by running the interpreter program. (Of 
course, some mechanism will be needed to run the interpreter program, e.g., 
hardware or software.) An equation similar to the above describes the effect 
of interpretive execution: 


$$[\![\mathit{interpreter}]\!](\mathit{program}, \mathit{data}_{in}) = \mathit{data}_{out}$$


Note that program is now used as data, and not as an active agent.
Self-interpretation is possible and useful [28]; the same value $\mathit{data}_{out}$ can be
computed by:


$$[\![\mathit{interpreter}]\!](\mathit{interpreter}, (\mathit{program}, \mathit{data}_{in})) = \mathit{data}_{out}$$
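The distinction between direct and interpretive execution can be illustrated with a minimal Python sketch. The toy language below (a list of `("mul", n)` and `("add", n)` pairs acting on an integer) and all names in it are assumptions made for illustration only; they are not part of the blob model. The function `direct` plays the role of an active program, while `PROGRAM` is a passive data object that only acts when handed to `interpreter`.

```python
from typing import List, Tuple

# Hypothetical toy instruction format: (operation name, integer argument).
Instruction = Tuple[str, int]


def direct(data_in: int) -> int:
    """Direct execution: the program itself is the active agent."""
    return data_in * 2 + 1


# The same computation written down as a passive data object.
PROGRAM: List[Instruction] = [("mul", 2), ("add", 1)]


def interpreter(program: List[Instruction], data_in: int) -> int:
    """Interpretive execution: walk over `program`, which is plain data."""
    value = data_in
    for op, arg in program:
        if op == "mul":
            value *= arg
        elif op == "add":
            value += arg
        else:
            raise ValueError(f"unknown operation {op!r}")
    return value


if __name__ == "__main__":
    # Both routes yield the same output value for the same input.
    print(direct(5), interpreter(PROGRAM, 5))
```

Because `PROGRAM` is ordinary data, it can be replaced without touching `interpreter`, which is the sense in which a program is software rather than hardware.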


4 Turing Completeness of Computational Models


How to show Turing completeness of a computation framework. 
This is typically shown by reduction from another problem already known 
to be Turing complete. Notation: let L and M denote languages (biological,
programming, whatever), and let $[\![p]\!]^{L}$ denote the result of executing
L-program p, for example an input-output function computed by p. Then we
can say that language M is at least as powerful as L if 


$$\forall p \in L\text{-programs}\ \ \exists q \in M\text{-programs}\ \left( [\![p]\!]^{L} = [\![q]\!]^{M} \right)$$


A popular choice is to let L be some very small Turing complete language, 
for instance Minsky register machines or two-counter machines (2CM). The 
next step is to let M be a biomolecular system of the sort being studied. 
The technical trick is to argue that, given any L-instance of (say) a 2CM 
program, it is possible to construct a biomolecular M-system that faithfully 
simulates the given 2CM. 
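To make the source language L of such a reduction concrete, here is a minimal sketch of a two-counter machine in Python. The instruction format (`inc`, `dec` with a zero branch, `halt`), the function name `run_2cm` and the step budget are assumptions chosen for illustration; the cited papers may use different but equivalent formulations.

```python
from typing import List, Tuple

# Hypothetical instruction formats:
#   ("inc", r, nxt)            counter r += 1, then go to nxt
#   ("dec", r, nxt, nxt_zero)  if counter r is 0 go to nxt_zero,
#                              else counter r -= 1 and go to nxt
#   ("halt",)                  stop
Program = List[tuple]


def run_2cm(program: Program, c0: int, c1: int,
            max_steps: int = 10_000) -> Tuple[int, int]:
    """Run a two-counter machine and return the final counter values."""
    counters = [c0, c1]
    pc = 0
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return counters[0], counters[1]
        if instr[0] == "inc":
            _, r, nxt = instr
            counters[r] += 1
            pc = nxt
        elif instr[0] == "dec":
            _, r, nxt, nxt_zero = instr
            if counters[r] == 0:
                pc = nxt_zero
            else:
                counters[r] -= 1
                pc = nxt
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    raise RuntimeError("step budget exhausted")


# Example program: add counter 0 into counter 1.
ADD: Program = [
    ("dec", 0, 1, 2),  # 0: if c0 == 0 goto 2, else c0 -= 1 and goto 1
    ("inc", 1, 0),     # 1: c1 += 1, goto 0
    ("halt",),         # 2: stop
]

if __name__ == "__main__":
    print(run_2cm(ADD, 3, 4))
```

Note that `run_2cm` takes the machine as data, so it is an interpreter for 2CM programs. A completeness argument by simulation would instead build a separate M-system for each individual program such as `ADD`.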

Oddly enough, Turing completeness is not often used to show that
certain problems can be solved by M-programs; but rather only to show that,
say, the equivalence or termination problems of M-programs are algorithmi- 
cally undecidable because they are undecidable for L, and the properties are 
preserved under the construction. This discussion brings up a central issue: 


Simulation as opposed to interpretation. Arguments to show 
Turing completeness are (as just described) usually by simulation: for each 
problem instance (say a 2CM) one somehow constructs a biomolecular system 
such that ...(the system in some sense solves the problem). However, in 
many papers for each problem instance the construction of the simulator is 
done by hand, e.g., by the author writing the article. In effect the existential 
quantifier in $\forall p\, \exists q\, ([\![p]\!]^{L} = [\![q]\!]^{M})$ is computed by hand. This phenomenon
is clearly visible in papers on cellular computation models: completeness is 
shown by simulation rather than by interpretation. 


In contrast, Turing’s original “Universal machine” simulates by means 
of interpretation: a stronger form of imitation, in which the existential 
quantifier is realised by machine. Turing’s “Universal machine” is capable 
of executing an arbitrary Turing machine program, once that program has 
been written down on the universal machine’s tape in the correct format, 
and its input data has been provided. Our research follows the same line, 
applied in a biological context: we show that simulation can be done by 
general interpretation, rather than by one-problem-at-a-time constructions. 


5 Programs in a Biochemical World 


Our goal is to express programs in a biochemical world. Programming 
assumptions based on silicon hardware must be radically re-examined to 
fit into a biochemical framework. We briefly summarize some qualitative 
differences. 


• There can be no pointers to data: addresses, links, or unlimited
list pointers. In order to be acted upon, a data value must be physically
adjacent to some form of actuator. A biochemical form of adjacency:
a chemical bond between program and data.


• There can be no action at a distance: all effects must be achieved
via chains of local interactions. A biological analog: signaling.


• There can be no nonlocal control transfer, e.g., no analog to
GOTOs or remote function/procedure calls. However some control
loops are acceptable, provided the “repeat point” is (physically) near
the loop end. A biological analog: a bond between different parts of
the same program.


• On the other hand there exist available biochemical resources to tap,
i.e., free energy so actions can be carried out, e.g., to construct local
data, to change the program control point, or to add local bonds into
an existing data structure. Biological analogs: Brownian movement,
ATP, oxygen.


The above constraints suggest how to structure a biologically feasible model 
of computation. The main idea is to keep both program control point and 
the current data inspection site always close to a focus point where all actions 
occur. This can be done by continually shifting the program or the data, 
to keep the active program blob (APB) and active data blob (ADB) always 
in reach of the focus. The picture illustrates this idea for direct program 
execution. 
[Figure: running program p, i.e., computing $[\![p]\!](d)$. Program p and Data d meet at a focus point for control and data, which connects the APB and the ADB through the program-to-data bond.]


5.1 The Blob Model 


We take a very simplified view of a (macro-)molecule and its interactions, 
with abstraction level similar to the Kappa model [13, 8, 15]. To avoid 
misleading detail questions about real molecules we use the generic term 
“blob” for an abstract molecule. A collection of blobs in the biological “soup” 
may be interconnected by two-way bonds linking the individual blobs’ bond 
sites. 

A program p is (by definition) a connected assembly of blobs. A data 
value d is (also) by definition a connected assembly of blobs. At any moment 
during execution, i.e., during computation of $[\![p]\!](d)$, we have:


Programming in Biomolecular Computation: 
Programs, Self-Interpretation and Visualisation 81 


• One blob in p is active, known as the active program blob or APB.
• One blob in d is active, known as the active data blob or ADB.


• A bond *, between the APB and the ADB, is linked at a specially
designated bond site, bond site 0, of each.


The data view of blobs: A blob has several bond sites and a few 
bits of local storage limited to fixed, finite domains. Specifically, our model 
will have four bond sites, identified by numbers 0,1,2,3. At any instant 
during execution, each can hold a bond, that is, a link to a (different) blob;
or a bond site can hold ⊥, indicating unbound.

In addition each blob has 8 cargo bits of local storage containing Boolean 
values, and also identified by numerical positions: 0,1,2,...,7. When used 
as program, the cargo bits contain an instruction (described below) plus an 
activation bit, set to 1. When used as data, the activation bit must be 0, but 
the remaining 7 bits may be used as the user wishes. (A biological analog to 
bits 1 or 0 is “phosphorylated” or “unphosphorylated”.)

[Figure: a blob with 3 bond sites bound and one unbound.]


Since bonds are in essence two-way pointers, they have a “fan-in” restriction: 
a given bond site can contain at most one bond (if not ⊥).
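The data view of a blob can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class name `Blob`, the helpers `bond` and `unbond`, and the representation of a bond as a `(partner, partner_site)` pair are hypothetical choices, not the authors' implementation. The numbers (8 cargo bits, 4 bond sites) and the fan-in restriction are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False, repr=False)
class Blob:
    """Abstract molecule with 8 cargo bits and 4 bond sites."""
    cargo: List[int] = field(default_factory=lambda: [0] * 8)
    # bonds[i] is None (unbound, written ⊥ in the text)
    # or a pair (partner blob, partner's bond site number).
    bonds: List[Optional[Tuple["Blob", int]]] = field(
        default_factory=lambda: [None] * 4
    )


def bond(a: Blob, site_a: int, b: Blob, site_b: int) -> None:
    """Create a two-way bond between a.site_a and b.site_b."""
    if a is b:
        raise ValueError("a bond links two different blobs")
    if a.bonds[site_a] is not None or b.bonds[site_b] is not None:
        # Fan-in restriction: a bond site holds at most one bond.
        raise ValueError("bond site already occupied")
    a.bonds[site_a] = (b, site_b)
    b.bonds[site_b] = (a, site_a)


def unbond(a: Blob, site_a: int) -> None:
    """Remove the bond at a.site_a (if any) from both of its ends."""
    link = a.bonds[site_a]
    if link is None:
        return
    b, site_b = link
    a.bonds[site_a] = None
    b.bonds[site_b] = None
```

Storing both ends of every bond keeps the link symmetric, which is what distinguishes a bond from a one-way pointer or an address.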

The program view of blobs: Blob programs are sequential. There 
is no structural distinction between blobs used as data and blobs used as 
program. A single, fixed set of instructions is available for moving and 
rearranging the cursors, and for testing or setting a cargo bit at the data 
cursor. Novelties from a computer science viewpoint: there are no explicit 
program or data addresses, just adjacent blobs. At any moment there is 
only a single program cursor and a single data cursor, connected by a bond 
written * above. 

Instructions, in general. The blob instructions correspond roughly 
to “four-address code” for a von Neumann-style computer. An essential 
difference, though, is that a bond is a two-way link between two blobs, and 
is not an address at all. It is not a pointer; there exists no address space 
as in a conventional computer. A blob’s 4 bond sites contain links to other 
instructions, or to data via the APB-ADB bond *. 


82 L. Hartmann, N.D. Jones, J.G. Simonsen, S.B. Vrist 


For program execution, one of the 8 cargo bits is an “activation bit”; 
if 1, it marks the instruction currently being executed. The remaining 
7 cargo bits are interpreted as a 7-bit instruction so there are 2⁷ = 128
possible instructions in all. An instruction has an operation code (around 15 
possibilities), and 0, 1 or 2 parameters that identify single bits, or bond sites, 
or cargo bits in a blob. See table below for current details. For example, 
SCG v c has 16 different versions since v can be one of 2 values, and c can 
be one of 8 values. 


Why exactly 4 bonds? The reason is that each instruction must have a 
bond to its predecessor; further, a test or “jump” instruction will have two 
successor bonds (true and false); and finally, there must be one bond to link 
the APB and the ADB, i.e., the bond * between the currently executing 
instruction and the currently visible data blob. The FIN instruction is a 
device to allow a locally limited fan-in. 


A specific instruction set (a bit arbitrary). The formal semantics 
of instruction execution are specified precisely by means of a set of 128 
biochemical reaction rules in the style of [13]. For brevity here, we just list 
the individual instruction formats and their informal semantics. Notation 
for instruction parameters: b is a 2-bit bond site number, c is a 3-bit cargo 
site number, and v is a 1-bit value. 


Numbering convention: the program APB and the data ADB are linked 
by bond * between bond sites 0 of the APB and the ADB. An instruction’s 
predecessor is linked to its bond site 1; bond site 2 is the instruction’s normal 
successor; and bond site 3 is the alternative “false” successor, used by jump 
instructions that test the value of a cargo bit or the presence of a bond. 


On the insert instruction INS. This creates a new blob, linked with 
the current ADB. Analogy: thinking of a blob as a cell or molecule (whichever 
paradigm seems natural), we are implicitly assuming that blobs are swimming 
in a biological “soup”, so INS just reconfigures a nearby element. From one
viewpoint, this action resembles detaching a new cell from the freelist (the 
list of available cells used in Lisp/Scheme implementations).

Instruction | Description           | Informal semantics (:=: is a two-way interchange)
------------+-----------------------+---------------------------------------------------
SCG v c     | Set CarGo bit         | ADB.c := v; APB := APB.2
JCG c       | Jump CarGo bit        | if ADB.c = 0
            |                       | then APB := APB.3 else APB := APB.2
JB b        | Jump Bond             | if ADB.b = ⊥
            |                       | then APB := APB.3 else APB := APB.2
CHD b       | CHange Data           | ADB := ADB.b; APB := APB.2
INS b1 b2   | INSert new bond       | new.b2 :=: ADB.b1;
            |                       | new.b1 :=: ADB.b1.bs; APB := APB.2
            |                       | “new” is a fresh blob, “bs” is
            |                       | the bond site that ADB.b1 was
            |                       | bound to before INS b1 b2.
SWL b1 b2   | SWap Links            | ADB.b1 :=: ADB.b2.b1; APB := APB.2
SBS b1 b2   | Swap Bond Sites       | ADB.b1 :=: ADB.b2; APB := APB.2
SWP1 b1 b2  | Swap bs1 on linked    | ADB.b1.1 :=: ADB.b2.1; APB := APB.2
SWP3 b1 b2  | Swap bs3 on linked    | ADB.b1.3 :=: ADB.b2.3; APB := APB.2
JN b1 b2    | Join b1 to linked b2  | ADB.b1 :=: ADB.b1.b2; APB := APB.2
DBS b       | Destination bond site | Cargo bits 0,1 := bond site
            |                       | of destination for ADB.b
FIN         | Fan IN                | APB := APB.2
EXT         | EXiT program          |
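The informal semantics of a few of these instructions can be sketched in Python. The sketch covers only SCG, JCG, JB, CHD, FIN and EXT, and it makes several simplifying assumptions that are not part of the model: instructions are stored as decoded tuples rather than cargo bits, a bond records only the neighbouring blob, and the APB-ADB bond * and the activation bits are represented implicitly by the pair of Python references `(apb, adb)`. All names are hypothetical.

```python
from typing import List, Optional, Tuple


class Blob:
    def __init__(self, instr: Optional[tuple] = None) -> None:
        self.cargo: List[int] = [0] * 8
        # Simplification: each bond site stores only the neighbouring blob.
        self.bonds: List[Optional["Blob"]] = [None] * 4
        self.instr = instr  # decoded instruction, e.g. ("SCG", 1, 5)


def link(a: Blob, site_a: int, b: Blob, site_b: int) -> None:
    """Two-way bond between a.site_a and b.site_b."""
    a.bonds[site_a] = b
    b.bonds[site_b] = a


def step(apb: Blob, adb: Blob) -> Tuple[Optional[Blob], Optional[Blob]]:
    """Execute one non-EXT instruction; return the new (APB, ADB)."""
    op, *args = apb.instr
    if op == "SCG":                      # ADB.c := v; APB := APB.2
        v, c = args
        adb.cargo[c] = v
        return apb.bonds[2], adb
    if op == "JCG":                      # test a cargo bit of the ADB
        (c,) = args
        return (apb.bonds[3] if adb.cargo[c] == 0 else apb.bonds[2]), adb
    if op == "JB":                       # test whether ADB.b is unbound
        (b,) = args
        return (apb.bonds[3] if adb.bonds[b] is None else apb.bonds[2]), adb
    if op == "CHD":                      # ADB := ADB.b; APB := APB.2
        (b,) = args
        return apb.bonds[2], adb.bonds[b]
    if op == "FIN":                      # APB := APB.2
        return apb.bonds[2], adb
    raise ValueError(f"instruction {op!r} is not covered by this sketch")


def run(apb: Blob, adb: Blob, max_steps: int = 1000) -> Blob:
    """Run until EXT and return the final ADB."""
    for _ in range(max_steps):
        if apb.instr[0] == "EXT":
            return adb
        apb, adb = step(apb, adb)
        if apb is None or adb is None:
            raise RuntimeError("fell off the program or the data")
    raise RuntimeError("step budget exhausted")


def demo() -> Tuple[List[int], bool]:
    """Set cargo bit 5 on every blob of a three-blob data chain."""
    i0, i1 = Blob(("SCG", 1, 5)), Blob(("JB", 2))
    i2, i3 = Blob(("CHD", 2)), Blob(("EXT",))
    link(i0, 2, i1, 1)   # SCG -> JB
    link(i1, 2, i2, 1)   # bond present: continue with CHD
    link(i1, 3, i3, 1)   # bond absent: exit
    link(i2, 2, i0, 1)   # loop back to SCG
    d = [Blob(), Blob(), Blob()]
    link(d[0], 2, d[1], 1)
    link(d[1], 2, d[2], 1)
    final = run(i0, d[0])
    return [blob.cargo[5] for blob in d], final is d[2]
```

In `demo`, the loop is closed by bonding the successor site of the CHD blob to the predecessor site of the SCG blob, which is possible here only because that predecessor site is otherwise unused.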


On the need for a fan-in instruction. The point with FIN (short 
for “fan-in”) is that in blob code, unlike say Scheme, there cannot exist
an unbounded number of pointers to a given blob (since every pointer 
corresponds to a bond site, and every blob has only 4 bond sites). So to 
achieve the effect of, say, 5 pointers to a blob instruction in a program, one 
can use a fan-in tree where each blob in the tree has at most 4 bond sites. 

An example in detail: the instruction SCG 1 5, as picture and 
as a rewrite rule. SCG stands for “set cargo bit”. The effect of instruction 
SCG 1 5 is to change the 5-th cargo bit of the ADB (active data blob) to 1. 
First, an informal picture to show its effect: 


[Figure: the Program and Data blob assemblies, shown before and after execution of SCG 1 5.]

Note: the APB-ADB bond * has moved. Before execution, it connected
APB with ADB. After execution, it connects APB’ with ADB, where APB’
is the next instruction: the successor (via bond S) of the previous APB. Also
note that the activation bit has changed: before, it was 1 at APB (indicating
that the APB was about to be executed) and 0 at APB’. Afterwards, those
two bit values have been interchanged.


Syntax: Code the above instruction as an 8-bit string with fields a, SCG, v, c: 1 100 1 101.
Here activation bit a = 1 indicates that this is the current instruction (about
to be executed). Operation code SCG (happens to be) encoded as 100; and
binary numbers are used to express the new value: v = 1, and the number
of the cargo bit to be set: c = 5.

The instruction also has four bond sites: * P S ⊥. Here P is a bond to
the predecessor of instruction SCG 1 5, S is a bond to its successor, and
bond site 3 is not used. The full instruction, with 8 cargo sites and four
bond sites can be written in the form²: B[1 100 1 101](* P S ⊥).
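The cargo-bit layout of SCG described above can be checked with a short Python sketch. Only the SCG opcode 100 is given in the text, so the sketch handles that one instruction; the function names `encode_scg` and `decode_scg` are hypothetical.

```python
SCG_OPCODE = "100"  # the encoding of SCG stated in the text


def encode_scg(v: int, c: int, active: bool = True) -> str:
    """Return the 8 cargo bits a | opcode | v | c for SCG v c."""
    if v not in (0, 1):
        raise ValueError("v must be a single bit")
    if not 0 <= c <= 7:
        raise ValueError("c must be a 3-bit cargo site number")
    return f"{int(active)}{SCG_OPCODE}{int(v)}{c:03b}"


def decode_scg(bits: str) -> dict:
    """Split an 8-bit SCG instruction back into its fields."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 8 binary digits")
    if bits[1:4] != SCG_OPCODE:
        raise ValueError("not an SCG instruction")
    return {"active": bits[0] == "1", "v": int(bits[4]), "c": int(bits[5:], 2)}


if __name__ == "__main__":
    print(encode_scg(1, 5))  # the instruction SCG 1 5, marked active
    # SCG has 2 * 8 = 16 versions, one per (v, c) pair.
    print(len({encode_scg(v, c) for v in (0, 1) for c in range(8)}))
```

With the activation bit cleared, the same function yields the cargo bits of an SCG instruction that is not currently being executed.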

Semantics: Instruction SCG 1 5 transforms the three blobs APB, 
APB’ and ADB as in the picture above. This can be expressed more exactly 
using a rewrite rule as in [13] that takes three members of the blob species 
into three modified ones. For brevity we write “-” at bond sites or cargo 
sites that are not modified by the rule. Note that the labels APB, ADB, etc. 
are not part of the formalism, just labels added to help the reader. 


APB                       APB'                 ADB
B[1 100 1 101](*-S-),     B[0--- ----](⊥S--),  B[0----a--](*---)
  =>
B[0 100 1 101](⊥-S-),     B[1--- ----](*S--),  B[0----1--](*---)
APB                       APB'                 ADB

6 The Blob World from a Computer Science Per- 
spective 


First, an operational image: Any well-formed blob program, while running, 
is a collection of program blobs that is adjacent to a collection of data blobs, 
such that there is one critical bond (*) that links the APB and the ADB
(the active program blob and the active data blob). As the computation 
proceeds, the program or data may move about, e.g., rotate as needed to 
keep their contact points adjacent (the APB and the ADB). For now, we 
shall not worry about the thermodynamic efficiency of moving arbitrarily 
large program and data in this way; for most realistic programs, we assume
them to be sufficiently small (on the order of thousands of blobs) that energy
considerations and blob coherence are not an issue.

² B stands for a member of the blob “species”.


6.1 The Blob Language 


It is certainly small: around 15 operation codes (for a total of 128 instructions 
if parameters are included). Further, the set is irredundant in that no 
instruction’s effect can be achieved by a combination of other instructions. 
There are easy computational tasks that simply cannot be performed by any 
program without, say, SCG or FIN. 

There is certainly a close analogy between blob programs and a rudi- 
mentary machine language. However a bond is not an address, but closer 
to a two-way pointer. On the other hand, there is no address space, and 
no address decoding hardware to move data to and from memory cells. An 
instruction has an unusual format, with 8 single bits and 4 two-way bonds. 
There is no fixed word size for data, there are no computed addresses, and 
there are no registers or indirection. 

Blob programs have some similarity to LISP or SCHEME, but: there
are no variables; there is no recursion; and bonds have a “fan-in” restriction. 


6.2 What Can Be Done in the Blob World? 


In principle the ideas presented and further directions are clearly expressible 
and testable in Maude or another tool for implementing term rewriting 
systems, or the kappa-calculus [8, 10, 13, 15]. Recent work involves program- 
ming a blob simulator, and execution visualiser. Prototype implementations 
of both have been made, described in Section 7.

The usual programming tasks (appending two lists, copying, etc.) can 
be solved straightforwardly, albeit not very elegantly because of the low level 
of blob code. Appendix A shows how to generate blob code from a Turing 
machine, thus establishing Turing-completeness. 

It seems possible to make an analogy between universality and self- 
reproduction that is tighter than seen in the von Neumann and other cellular 
automaton approaches. It should now be clear that familiar Computer
Science concepts such as interpreters and compilers also make sense at
the biological level, and hold the promise of becoming useful operational and 
utilitarian tools.


6.3 Self-interpretation in the Blob World


The figure of Section 5 becomes even more interesting when a program is 
executed interpretively, computing $[\![\mathit{interpreter}]\!](p, d)$.


[Figure: the Interpreter acting on Program p and Data d. The interpreter’s data is p and d together.]


We have designed, developed and implemented a “blob universal machine”,
i.e., a self-interpreter for the blob formalism that is closely analogous to 
Turing’s original universal machine. More details appear in Section 7.1. 


6.4 Parsimony of the Instruction Set 


All instructions are currently in use in the self-interpreter, indeed all instruc- 
tions appeared to be necessary in programming it. With the possible (but, 
we believe, unlikely) exception of the various swap instructions (SWL, SBS, 
SWP1, SWP3), we conjecture the instruction set to be parsimonious in the 
sense that no proper subset of the instruction set can be used to simulate 
the remaining instructions. A possible formal proof is being investigated. 


6.5 Dimensionality Limitations 


The physical world imposes a dimensionality requirement we have not yet 
addressed: data and program code cannot be packed with a density greater 
than that allowed by three-dimensional Euclidean space. The idea of a 
biologically plausible computing model that must work in 3-space provokes 
several interesting questions. 

In the blob model, following a chain of k bonds from the active data
blob (at any time in a computation) should give access to at most O(k³)
blobs. This is not guaranteed by the blob model as presented above; indeed,
a blob program could build a complete 3-ary tree of depth k and containing
3^k blobs at distance k. This structure could not be represented in 3-space
with our restrictions, and still have the intended semantic structure: that 
any two blobs linked by a bond should be adjacent in the biological “soup”. 
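The mismatch between exponential and cubic growth can be made concrete with a small calculation. As a stand-in for physical adjacency, the sketch below assumes that blobs sit on the integer lattice in 3-space and that a bond joins lattice neighbours, so that k bonds reach at most the points within Manhattan distance k. This lattice model and the function names are assumptions for illustration only; the blob model itself fixes no geometry.

```python
from typing import Optional


def lattice_ball(k: int) -> int:
    """Lattice points of 3-space within Manhattan distance k of the origin.

    The shell at distance j >= 1 contains 4*j*j + 2 points, so the total
    grows as a cubic polynomial in k.
    """
    return 1 + sum(4 * j * j + 2 for j in range(1, k + 1))


def ternary_level(k: int) -> int:
    """Blobs at distance exactly k in a complete 3-ary tree."""
    return 3 ** k


def first_overflow(max_k: int = 50) -> Optional[int]:
    """Smallest depth at which one tree level outnumbers the whole ball."""
    for k in range(max_k + 1):
        if ternary_level(k) > lattice_ball(k):
            return k
    return None


if __name__ == "__main__":
    for k in range(8):
        print(k, ternary_level(k), lattice_ball(k))
    print("first overflow at depth", first_overflow())
```

Under these assumptions a single level of the tree already exceeds the number of available positions at a small depth, so the tree cannot be embedded with all bonded blobs adjacent.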

The usual Turing machine has a fixed number of 1-dimensional tapes 
(though k-dimensional versions exist, for fixed k). Cellular automata as in
[44, 9, 47] have a fixed 2-dimensional architecture. Dimensionality questions
are not relevant to Minsky-style machines with a fixed number of registers, 
e.g., the two-counter machine. 


Machines that allow computed addresses and indirection, e.g., the RAM, 
RASP, etc., have no dimensionality limitations at all, just as in the “raw” 
blob model: traversing a chain of k bonds from one memory cell can give access
to a number of cells exponential in k (or higher if indexing is allowed).


The well-known and well-developed Turing-based computational
complexity theory starts by restricting programs’ running time and/or space. A
possible analogy would be to limit the dimensionality of the data structures
that a program may build during a computation. 


Pursuing the analogy, the much-studied complexity class PTIME is quite 
large, indeed so large that dimensionality makes no difference: on any 
traditional model where data dimensionality makes sense, it would be an 
easy exercise to show that PTIME = PTIME3D. What if instead we study 
the class LINTIME of problems solvable in linear time (as a function of input 
size)? Alas, this smaller, realistically motivated class is not very robust for 
Turing machines, as small differences in Turing models can give different 
versions of LINTIME (Sections 18, 19, 25.6 in [29]). It seems likely though 
that the LINTIME class for blob machines is considerably more robust. 


Conjecture: LINTIME3D ⊂ LINTIME on the blob model. 


Another interesting question: does self-interpretation cause a need 
for higher dimensionality? We conjecture that this is not so for any one 
fixed interpreted program; but that diagonalisation constructions can force 
the necessary dimensionality to increase. This appears to be an excellent 
direction for future work. 


7 Implementation: A Self-Interpreter and Program 
Visualizer 


Two program tools have been developed for interaction with blobs: A 
self-interpreter written in the blob programming language itself, and a 
program visualiser written in Java. Both tools are available for download at 
http://blobvis.appspot.com 


7.1 Self-interpreter 


The self-interpreter consists of approximately 50000 blobs, partially auto- 
matically generated. This seemingly large number of blobs is mostly due to 
fanins and the fact that state information has to be encoded directly in the 
interpreter (see below). 

The blob self-interpreter is written using a standard execution cycle: 


1. Dispatch on syntax: decode the current instruction ins at the ADB; 
2. Execute interpreter code to simulate ins; 

3. This will update the program counter, and may modify data; 

4. Repeat the cycle, if ins was not EXT. 


The point of control of the interpreter is its APB. The APB is (as always) 
connected to the ADB. The ADB has two bonds, one to the point of control of the 
program being interpreted, and one to the ADB of the interpreted program. 
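
The execution cycle can be sketched in a host language. The following is a minimal Python illustration, not the blob self-interpreter itself: the program is an assumed list of tuples with integer jump targets rather than bonded program blobs, only four instructions are handled, and their semantics are inferred from the comments in the listings of Appendix A. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Blob:
    cargo: list = field(default_factory=lambda: [0, 0, 0, 0])
    bonds: dict = field(default_factory=dict, repr=False)  # bond site -> Blob


def run(program, adb, max_steps=10_000):
    """Interpret `program` starting from active data blob `adb`."""
    pc = 0  # assumption: an index stands in for the active program blob
    for _ in range(max_steps):
        op, *args = program[pc]        # 1. dispatch on syntax
        if op == "EXT":                # 4. stop once EXT is reached
            return adb
        if op == "SCG":                # 2. simulate: set cargo bit to value
            value, bit = args
            adb.cargo[bit] = value
            pc += 1                    # 3. update the program counter
        elif op == "CHD":              # move the ADB along a bond site
            adb = adb.bonds[args[0]]
            pc += 1
        elif op == "JCG":              # jump if the cargo bit is 1
            bit, target = args
            pc = target if adb.cargo[bit] == 1 else pc + 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise RuntimeError("step limit exceeded")
```

For example, with two blobs `a` and `b` linked by `a.bonds[2] = b`, the program `[("SCG", 1, 1), ("CHD", 2), ("JCG", 1, 4), ("SCG", 1, 1), ("EXT",)]` sets cargo bit 1 on both blobs and halts with `b` as the ADB.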

Differences from traditional interpreters: (A) There are only 128 possible 
instructions, including all values of instruction parameters; (B) the blob 
language does not afford the programmer access to (symbolic) variables. To 
circumvent the lack of variables we encode state information for the self 
interpreter implicitly in its control point. 

Whenever a variable would be used to hold state information in a 
traditional programming language, we instead create the equivalent of a 
switch statement, where each branch corresponds to one bit of the state 
information. Only a constant number of bits needs to be stored at a given 
time. 
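
The idea of keeping state in the control point rather than in a variable can be illustrated outside the blob language. The sketch below is an assumed analogy in Python: both functions compute the parity of a bit sequence, but the second has no state variable, and the one bit of state is represented by which of two code locations (`even` or `odd`) is currently executing.

```python
def parity_with_variable(bits):
    """Conventional version: one bit of state held in a variable."""
    state = 0
    for b in bits:
        state ^= b
    return state


def parity_in_control(bits):
    """Same function, with the state encoded by the point of control."""
    it = iter(bits)

    def even():            # being here means: parity so far is 0
        for b in it:
            if b:
                return odd()
        return 0

    def odd():             # being here means: parity so far is 1
        for b in it:
            if b:
                return even()
        return 1

    return even()
```

Each additional bit of state doubles the number of code locations needed, which is consistent with the large blob count reported for the self-interpreter.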

A bird’s eye view of the self-interpreter is given in Figure 3. The clear 
branching structure progressing from the center to the periphery is due to a 
combination of the dispatch on syntax in the main execution cycle, and the 
need to encode state information as described above. 


7.2 A Tool for Blob Visualisation 


The main purpose of the visualiser is to provide an overview of a given blob 
program and its execution. 
The most pertinent features of the visualiser: 


• Interactive graph layout of blob programs and data, using a force-directed 
physics model. 

• Ability to perform program execution one step at a time, and as an 
animation. 


• The ability to add and remove data blobs at run time. 


• The ability to export program state images before, during, and after 
execution. 


We use a force-directed, physics-based graph layout algorithm, based on a 
combination of classical spring embedding [16, 20, 30], gravitational/magnetic 
field heuristics, and a damped “drag force” to minimize fluttering [19, 38]. The 
graph layout follows standard methods for providing aesthetically pleasing 
layouts [22], employing the Prefuse toolkit [26]. 
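
The spring-plus-drag part of such a layout can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the visualiser's code and not the Prefuse API: the constants, the circular starting positions, and the function name are hypothetical, and the gravitational/magnetic heuristics are omitted.

```python
import math


def force_layout(nodes, edges, steps=500, rest=1.0,
                 k_spring=0.1, k_repel=0.05, damping=0.85):
    """Return 2-D positions for `nodes` (a list) after `steps` iterations."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    vel = {v: [0.0, 0.0] for v in nodes}
    for _ in range(steps):
        force = {v: [0.0, 0.0] for v in nodes}
        # Pairwise inverse-square repulsion between all nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k_repel / (d * d)
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Hooke springs along bonds, with rest length `rest`.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k_spring * (d - rest)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d
        # Damped ("drag") velocity update, which suppresses oscillation.
        for v in nodes:
            for axis in (0, 1):
                vel[v][axis] = damping * (vel[v][axis] + force[v][axis])
                pos[v][axis] += vel[v][axis]
    return pos
```

For two bonded nodes the layout settles where spring attraction balances repulsion, i.e. at a distance d satisfying k_spring (d - rest) = k_repel / d^2.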



Figure 1: The main interface of the blob visualiser. 1) Visualisation area, 2) 
zoom buttons, 3) connectivity filter, 4) force-directed layout start/pause 
button, 5) blob program controls. Program blobs are green, data blobs red; 
the APB and ADB are emphasized by brighter colours and thicker bond 
lines. 


Some examples: Figure 1 shows the visualiser’s main interface. Figure 
2 shows a simple program appending two (blob representations of) lists. 


Figure 2: The “ListAppend” program. This visualisation shows the pro- 
gram’s two loops. Using the visualisation tool it is possible to experiment 
with larger lists by dynamically adding more data at run time. 


Self-interpreter The self-interpreter of Section 7.1 is shown in Figures 
3 and 4. 


Figure 3: Visualisation of the Self-Interpreter limited to showing only blobs 
at most 25 bonds removed from starting APB. 


Figure 4: Visualisation of the Self-Interpreter with all blobs visible. The 
looping structure in the middle dominates the picture. 


8 Contributions of This Work 


We have for the first time investigated the possibility of programmable bio- 
level computation. The work sketched above, in particular the functioning of 
blob code, can all be naturally expressed in the form of abstract biochemical 
reaction rules. Further, we have shown molecular computation to be universal 
in a very strong sense: Not only can every computable function be computed 
by a blob program, but this can all be done using a single, fixed, set of 
reaction rules: it is not necessary to resort to constructing new rule sets (in 
essence, new biochemical architectures) in order to solve new problems; it is 
enough to write new programs. 

The new framework provides Turing-completeness efficiently and with- 
out asymptotic slowdowns. It seems possible to make a tighter analogy 
between universality and self-reproduction than by the von Neumann and 
other cellular automaton approaches. 

It should be clear that familiar Computer Science concepts such as 
interpreters and compilers also make sense at the biological level, and 
hold the promise of becoming useful operational and utilitarian tools. 

We have implemented a prototype system for experimenting with pro- 
gramming the blob model. The visualisation tool can be accessed on the 
web at http://blobvis.appspot.com/. 


9 Directions for Future Work 


We believe the natural next steps to be the following, both for the blob 
framework specifically, and for programming in biomolecular settings in 
general. 


• Investigate the “load-and-go” capability of stored programs. 


• Do self-application, e.g., compiling as well as interpretation. 


• Develop and implement a self-reproducing program. 


• Develop fair cost models, e.g., for the time to execute one instruction. 


• Understand limitations imposed by (blob) adjacency, and restriction 
of programs and data to 2 or 3 dimensions. 


In addition to the above, there are a number of interesting directions 
to pursue: 



• While our model can support full parallelism (as often seen in 
biologically-inspired computing), the main focus of this paper is on completeness 
and universality, so for simplicity we have only considered one program 
running on one piece of data (a connected set of blobs). 


  While parallelism would be very natural, in our opinion adding it too 
  early would in effect open Pandora’s box: suddenly lots of questions 
  natural to concurrency models would arise, distracting attention from 
  the interesting points seen above. 


• Biological operations seem to support “probabilistic” setups even more 
easily than deterministic binary computations. We understand that 
biological operations in effect set up redundant computations to deal 
with the noise or chatter of imprecise biological computation. 


  We have ignored this question for now, but it would need to be answered 
  in order to achieve more biological plausibility. 


• Allow self-modifying programs. This is natural (in fact inevitable) 
since there is no structural distinction between blobs used as data and 
blobs used as program. 


• Partial evaluation, e.g., specialisation of biological objects to relatively 
unchanging environments (one thinks of viruses). The aim would be 
to obtain efficiency increases (e.g., faster reaction times). Efficiency 
improvement by program specialisation has been studied extensively 
for traditional programming languages, e.g., see [28]. 


A Turing completeness of the blob model 


We prove that any one-tape Turing machine with a single read/write head 
may be simulated by a blob program. The tape contents are always finite 
and enclosed between a left endmarker < and a right endmarker >. 


A.1 Turing machine syntax 


A Turing machine is a tuple Z = ({0,1}, Q, δ, q_start, q_halt). The tape and 
input alphabet are {0,1}. (Blanks are not included, but may be encoded 
suitably by bits.) Q is a finite set of control states including distinct start 
and halting states q_start, q_halt ∈ Q. The transition function has type 


    δ : {0,1,<,>} × Q → 𝒜 × Q 


where an action is any A ∈ 𝒜 = {L,R,W0,W1}. Notation: we write a 
Turing machine instruction as 


    δ(q,b) → (A,r) 


meaning “In state q, reading bit b, perform action A and move to state 
r”. Actions L,R,W0,W1 mean informally “move Left, move Right, Write 
0, Write 1”, respectively. For simplicity we assume that Turing machines 
may not both move and write on the tape in the same atomic step. (A 
“write-and-move” action may easily be implemented using two states and two 
steps.) 

We also assume that every Turing machine satisfies the following con- 
sistency assumptions: 


• If δ(q,<) → (A,r) is an instruction, then A = R (i.e. the machine 
never moves to the left of the left endmarker and cannot overwrite the 
endmarker). 


• If δ(q,>) → (A,r) is an instruction, then A ∈ {L,W0,W1} (i.e. the machine 
never moves to the right of the right endmarker, but can overwrite the 
endmarker). 


Definition 1 Let M be a Turing machine. The state graph of M is the 
directed graph where the nodes are the states of M and there is a directed 
edge from q to r annotated (b, A) if there is an instruction δ(q,b) → (A,r). 
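
As an illustration of Definition 1, the state graph can be built directly from a transition table. The sketch below assumes a Python dictionary mapping (q, b) to (A, r) as the representation of δ; this encoding and the function names are hypothetical. The in-degree it computes is the quantity used later when generating fanin blobs.

```python
from collections import defaultdict


def state_graph(delta):
    """Map each state q to its outgoing edges (r, (b, A))."""
    graph = defaultdict(list)
    for (q, b), (action, r) in delta.items():
        graph[q].append((r, (b, action)))
    return dict(graph)


def in_degree(delta):
    """Number of directed edges entering each state."""
    degree = defaultdict(int)
    for _action, r in delta.values():
        degree[r] += 1
    return dict(degree)


# Assumed example machine: flips every bit of the input, then halts.
FLIP = {
    ("scan", "0"): ("W1", "adv"),
    ("scan", "1"): ("W0", "adv"),
    ("adv", "0"): ("R", "scan"),
    ("adv", "1"): ("R", "scan"),
    ("scan", ">"): ("L", "halt"),
}
```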


A.2 Turing machine semantics 


A total state has the form 


              q 
    < b1 ... bi ... bn > 


where the bi are tape symbols, and q is a control state. We define the tape 
contents of the machine to be everything enclosed between < and >. 

The Turing machine defines a one-step transition relation between total 
states in the expected way (not spelled out here). Tapes may only grow 
to the right, not the left. We assume that if there is an instruction of the 
form δ(q,>) → (W0,r) or δ(q,>) → (W1,r) (i.e. the right endmarker is 
overwritten), then the tape is automatically extended to the right with a 
new endmarker to the immediate right of the previous endmarker. 

Remark: the tape contents will always be finite after a finite number of 
computation steps. 

Input/Output: A Turing machine Z computes a partial function 


    [[Z]] : {0,1}* → {0,1}* 


• Input: The machine is in its start state with the tape head on the tape 
cell to the immediate right of the left endmarker <. The input is the 
contents of the tape. 


• Output: The machine is in its halt state. The output is the contents 
of the tape. 
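
The semantics above can be made executable as a small reference simulator, useful for checking compiled blob code against. The following Python sketch assumes δ is given as a dictionary from (q, b) to (A, r) and that the endmarkers are the characters "<" and ">"; these encodings and the function name are assumptions for illustration only.

```python
LEFT, RIGHT = "<", ">"


def run_tm(delta, q_start, q_halt, bits, max_steps=10_000):
    """Run the machine on input string `bits`; return the output tape contents."""
    tape = [LEFT] + list(bits) + [RIGHT]
    head, q = 1, q_start              # head starts just right of the left endmarker
    for _ in range(max_steps):
        if q == q_halt:
            return "".join(tape[1:-1])  # everything between the endmarkers
        action, q_next = delta[(q, tape[head])]
        if action == "L":
            head -= 1
        elif action == "R":
            head += 1
        else:                         # "W0" or "W1"
            if tape[head] == RIGHT:
                tape.append(RIGHT)    # overwriting > extends the tape to the right
            tape[head] = action[1]
        q = q_next
    raise RuntimeError("step limit exceeded")


# Assumed example machine: flips every bit of the input, then halts.
FLIP = {
    ("scan", "0"): ("W1", "adv"),
    ("scan", "1"): ("W0", "adv"),
    ("adv", "0"): ("R", "scan"),
    ("adv", "1"): ("R", "scan"),
    ("scan", ">"): ("L", "halt"),
}
```

The simulator does not check the consistency assumptions of Section A.1; a transition table violating them may move the head off the tape.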


A.3 Compiling a Turing machine into a blob program 


We describe a way to compile any Turing machine Z = ({0,1}, Q, δ, q_start, q_halt) 
into blob program code code(Z) that simulates it. Compilation of a Turing 
machine into blob code is as follows: 


• Generate blob code for each instruction δ(q,b) → (A,r). 


• Collect blob code for all the states into a single blob program. 


Before describing the compilation algorithm, we explain how the blob code 
realises a step-by-step simulation of the Turing machine Z. 


A.3.1 Turing machine representation by blobs 


At any time t in its computation, the Turing machine’s tape b1 ... bi ... bn 
will be represented by a finite sequence B1 ... Bi ... Bn of blobs. If at time t the 
Turing machine head is scanning tape symbol bi, the active data blob will 
be the blob Bi. Arrangement: each Bi is linked to its predecessor via bond 
site 1, and to its successor via bond site 2. The Turing machine’s control 
state will correspond to the active program blob in code(Z). 

The cargo bits of the “data blobs” are used to indicate the contents of 
the tape cell: 


• Cargo bit 0 is unused in the simulation. 


• Cargo bit 1 is used to hold the bit occupied by the tape cell (if the 
blob represents either < or >, the contents of cargo bit 1 is irrelevant). 


• Cargo bit 2 is ’1’ iff the blob represents the left endmarker <. 


• Cargo bit 3 is ’1’ iff the blob represents the right endmarker >. 
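
The tape representation can be sketched as a doubly linked chain of records. In the Python illustration below, the `Blob` class, its four-element cargo list, and the dictionary of bond sites are assumed encodings chosen for readability; only the roles of cargo bits 1 to 3 and bond sites 1 and 2 are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Blob:
    cargo: list = field(default_factory=lambda: [0, 0, 0, 0])
    bonds: dict = field(default_factory=dict, repr=False)  # bond site -> Blob


def tape_to_blobs(bits):
    """Encode '<' bits '>' as a chain of blobs; returns the list B1 ... Bn."""
    left = Blob()
    left.cargo[2] = 1                 # cargo bit 2 marks the left endmarker
    chain = [left]
    for b in bits:
        cell = Blob()
        cell.cargo[1] = int(b)        # cargo bit 1 holds the tape bit
        chain.append(cell)
    right = Blob()
    right.cargo[3] = 1                # cargo bit 3 marks the right endmarker
    chain.append(right)
    for prev, nxt in zip(chain, chain[1:]):
        prev.bonds[2] = nxt           # bond site 2: successor
        nxt.bonds[1] = prev           # bond site 1: predecessor
    return chain


def blobs_to_tape(left):
    """Decode the tape contents by walking successor bonds from the left endmarker."""
    out = []
    cell = left.bonds[2]
    while cell.cargo[3] != 1:
        out.append(str(cell.cargo[1]))
        cell = cell.bonds[2]
    return "".join(out)
```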


A.3.2 Syntax of the generated code 


We will write the generated blob target program as straightline code with 
labels. For every instruction, the “next” blob code instruction to be executed 
is the one linked to the active program blob by the latter’s “successor” bond 
site 2. Thus, in 


SCG 0 5 
EXT 


the blob corresponding to SCG 0 5 has its bond site 2 linked to the “prede- 
cessor” bond site 1 of the blob corresponding to EXT. 


A.3.3 Code generation for each state 


Let q ≠ q_halt be a state. The four possible kinds of transitions on state q are: 


    δ(q,0) → (A0,q0) 
    δ(q,1) → (A1,q1) 
    δ(q,<) → (AL,qL) 
    δ(q,>) → (AR,qR) 


where q0, q1, qL, qR ∈ Q, A0, A1 ∈ {L, R, W0, W1}, AL = R, and AR ∈ {L, W0, W1} 
(the last two by the consistency assumptions of Section A.1). 

We generate code for q as follows. For typographical reasons, < = EL 
and > = ER. The action code notation [A0] etc. is explained below, as is 
the label notation <label>. The initial FIN code may be safely ignored on 
the first reading. 


    Generate i-1 FIN     // Assume program port 2 is always "next" operation 
                         // Each FIN is labeled as noted below 
                         // The last FIN is bound (on its bond site 2) to 
                         // the blob labeled ’Q’ below. 

    Q:   JCG 2 QLE       // If 1, we’re at left tape end 
                         // By convention, bond site 3 of the APB is 
                         // bound to the blob labeled QLE 
         JCG 3 QRE       // If 1, we’re at right tape end 
         JCG 1 Q1        // We’re not at any end. If ’0’ is scanned, move along 
                         // (on bond site 2), 
                         // otherwise a ’1’ is scanned, jump to Q1 
                         // (on bond site 3) 
         [A0]            // Insert code for action A0 
         FIN qA0q0       // Go to appropriate fanin before q0 (on bond site 2) 

    Q1:  [A1]            // Insert code for action A1 
         FIN qA1q1       // Go to appropriate fanin before q1 (on bond site 2) 

    QLE: [AL]            // Insert code for AL 
         FIN qELALqL     // Go to appropriate fanin before qL (on bond site 2) 

    QRE: R[AR]           // Insert code for AR (with the R[ ]-function) 
         FIN qERARqR     // Go to appropriate fanin before qR (on bond site 2) 
                         // Code for q end 


Code for q_halt: 


    Generate i-1 FIN     // Assume program port 2 is always "next" operation 
                         // Each FIN is labeled as noted below 
                         // The last FIN is bound (on its bond site 2) to 
                         // the blob labeled ’Qh’ below. 

    Qh:  EXT 


The JCG instructions test the data blob Bi to see which of the four 
possible kinds of transitions should be applied. Codes [A0], [A1], [AL], 
R[AR] simulate the effect of the transition, and the FIN after each in effect 
does a “go to” to the blob code for the Turing machine’s next state. (This 
is made trickier by the fan-in restrictions, see Section A.3.7 below.) 
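
The per-state template can be instantiated mechanically. The Python sketch below is an illustration under assumptions: δ is a dictionary from (q, b) to (A, r), the output is a list of (label, instruction) pairs rather than bonded blobs, the initial run of i-1 FIN blobs is omitted, and FIN labels follow the q'bAq scheme of Section A.3.7 with EL and ER for the endmarkers. The function name is hypothetical.

```python
def gen_state(q, delta):
    """Return (label, instruction) pairs for a non-halting state q."""
    a0, q0 = delta[(q, "0")]
    a1, q1 = delta[(q, "1")]
    al, ql = delta[(q, "<")]
    ar, qr = delta[(q, ">")]
    return [
        (q, f"JCG 2 {q}LE"),            # cargo bit 2 set: left endmarker
        ("", f"JCG 3 {q}RE"),           # cargo bit 3 set: right endmarker
        ("", f"JCG 1 {q}1"),            # cargo bit 1 set: a '1' is scanned
        ("", f"[{a0}]"),                # otherwise a '0' is scanned
        ("", f"FIN {q}0{a0}{q0}"),
        (f"{q}1", f"[{a1}]"),
        ("", f"FIN {q}1{a1}{q1}"),
        (f"{q}LE", f"[{al}]"),
        ("", f"FIN {q}EL{al}{ql}"),
        (f"{q}RE", f"R[{ar}]"),         # right end uses the R[ ] function
        ("", f"FIN {q}ER{ar}{qr}"),
    ]
```

The bracketed entries such as `[W1]` are placeholders to be expanded by the two auxiliary functions described next.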


A.3.4 Two auxiliary functions 


We use two auxiliary functions to generate code: 


    [ ] : {L, R, W0, W1} → blobcode 


and 

    R[ ] : {L, W0, W1} → blobcode 


Function [ ] is used for code generation on arbitrary tape cells, and 
R[ ] for code generation when the Turing machine head is on the right end 
marker, where some housekeeping chores must be performed due to tape 
extension. 


A.3.5 Code generation for instructions not affecting the right 
end of the tape 


    [W0] = 
        SCG 0 1   // Set tape cell content to 0 

    [W1] = 
        SCG 1 1   // Set tape cell content to 1 

    [L] = 
        CHD 1     // Set ADB to previous blob (move tape left) 

    [R] = 
        CHD 2     // Set ADB to next blob (move tape right) 


A.3.6 Code generation for instructions that can extend the tape 


    R[W0] = 
        SCG 0 3   // Current blob is no longer at right tape end 
        INS 2 1   // Insert new blob at bond port 2 on ADB 
                  // (new tape cell). New blob is bound at site 1. 
        CHD 2     // Change ADB to new blob (move head right) 
        SCG 1 3   // New blob is at the right end of the tape 
        CHD 1     // Change ADB to original blob (move head left) 
        SCG 0 1   // Write a ’0’ in the tape cell (as per W0) 

    R[W1] = 
        SCG 0 3   // Current blob is no longer at right tape end 
        INS 2 1   // Insert new blob at bond port 2 on ADB 
                  // (new tape cell). New blob is bound at site 1. 
        CHD 2     // Change ADB to new blob (move head right) 
        SCG 1 3   // New blob is at the right end of the tape 
        CHD 1     // Change ADB to original blob (move head left) 
        SCG 1 1   // Write a ’1’ in the tape cell (as per W1) 

    R[L] = [L]    // Move to the left 
                  // TM does not move right at right tape end. 
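
The effect of these instruction sequences can be checked with a small executor. The Python sketch below is an illustration under assumptions: blobs are records with a four-bit cargo list and a dictionary of bond sites, instruction semantics are inferred from the comments in the listings, and INS is only handled for the case used here (the bond site on the ADB is free). All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Blob:
    cargo: list = field(default_factory=lambda: [0, 0, 0, 0])
    bonds: dict = field(default_factory=dict, repr=False)  # bond site -> Blob


# Straight-line code per action, transcribed from Sections A.3.5 and A.3.6.
ACTION = {
    "W0": [("SCG", 0, 1)],
    "W1": [("SCG", 1, 1)],
    "L": [("CHD", 1)],
    "R": [("CHD", 2)],
}


def extend_then_write(bit):
    return [("SCG", 0, 3), ("INS", 2, 1), ("CHD", 2),
            ("SCG", 1, 3), ("CHD", 1), ("SCG", bit, 1)]


R_ACTION = {"W0": extend_then_write(0), "W1": extend_then_write(1),
            "L": ACTION["L"]}


def execute(code, adb):
    """Run straight-line code from active data blob `adb`; return the final ADB."""
    for op, *args in code:
        if op == "SCG":                     # set cargo bit `bit` to `value`
            value, bit = args
            adb.cargo[bit] = value
        elif op == "CHD":                   # follow a bond site to a new ADB
            adb = adb.bonds[args[0]]
        elif op == "INS":                   # assumption: target site must be free
            site_on_adb, site_on_new = args
            if site_on_adb in adb.bonds:
                raise NotImplementedError("only insertion at a free site is sketched")
            new = Blob()
            adb.bonds[site_on_adb] = new
            new.bonds[site_on_new] = adb
        else:
            raise ValueError(f"unknown instruction: {op}")
    return adb
```

Running `execute(R_ACTION["W1"], right)` on a right endmarker blob turns it into an ordinary cell holding 1 and leaves a fresh endmarker bonded at its site 2.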


A.3.7 Control flow in the generated blob code 


A technical problem in code generation. We now explain the meaning 
of the somewhat cryptic comments such as “Go to appropriate fanin 
before q1” in Section A.3.3, and notations such as qA0q0. 

The problem: while a pointer-oriented language allows an unbounded 
number of pointers into the same memory cell, this is not true for the 
blob structures (the reason is that a bond is intended to model a chemical 
connection between two molecules). This is a “fan-in” restriction on program 
(and data) syntax. 

A consequence: blob program code may not contain more than one 
control transfer to a given instruction, unless this is done by a bond site 
different from the usual “predecessor” site 1. The purpose of the instruction 
FIN is to allow two entry points: one as usual by bond site 1, and a second 
by bond site 3. 

The initial FIN code generated in Section A.3.3. This concerns 
the entry points into blob code for a Turing state q. Let i be the number of 
directed edges to q in the state graph (i.e., the number of “go to’s” to q). 

If i ≤ 1, we generate no fanin blobs. 

Otherwise, we generate i - 1 fanin blobs before the code generated for 
q; these handle the i transitions to q. The blobs bound to the fanin nodes 
occur in the code generated for other states (perhaps from q to itself). For 
each transition δ(q’,b) → (A,q), a blob in the code generated for q’ is bound 
to a single fanin blob for q. The fanin blob generated above, before the 
generated code for state q, is labeled by q’bAq. 
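
The number of fanin blobs and their labels can be computed from the transition table alone. The Python sketch below assumes δ is a dictionary from (q, b) to (A, r) and writes EL and ER for the endmarkers in labels, as in Section A.3.3; the function name and the returned data shape are hypothetical.

```python
from collections import defaultdict

SYMBOL_NAME = {"0": "0", "1": "1", "<": "EL", ">": "ER"}


def fanin_plan(delta):
    """For each target state q, return (number of fanin blobs, entry labels)."""
    incoming = defaultdict(list)
    for (q_src, b), (action, q_dst) in sorted(delta.items()):
        incoming[q_dst].append(f"{q_src}{SYMBOL_NAME[b]}{action}{q_dst}")
    # i incoming transitions need i - 1 fanin blobs (none when i <= 1).
    return {q: (max(0, len(labels) - 1), labels)
            for q, labels in incoming.items()}


# Assumed example machine: flips every bit of the input, then halts.
FLIP = {
    ("scan", "0"): ("W1", "adv"),
    ("scan", "1"): ("W0", "adv"),
    ("adv", "0"): ("R", "scan"),
    ("adv", "1"): ("R", "scan"),
    ("scan", ">"): ("L", "halt"),
}
```

States that are never the target of a transition (for example, a start state with no incoming edges) do not appear in the result and need no fanin blobs.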